STA2453 Project #3

Due dates:

  • → Presentation on March 31, 10-12

  • → Final submission on April 10, 23:59

Questions

  • How can the data be used to understand desire lines i.e. where people come from and where they go?
  • How can we use the data to understand dwell time at 307? Where are areas of 307 that people pass through? Where do people tend to linger? Can we develop a map of “dwell time” at 307 and understand how it changes over time?
  • Does weather play a role in the number of visitors to 307? What is the impact of events (see calendar of events) at 307 on desire lines and dwell time? What are the most successful events at 307? Create a definition of "success" (most visitors, or visitors spending the most time) and see if that changes the result.
  • 307 requires maintenance after 500 visitors or 500 hours of use. When should 307 plan on scheduling maintenance operations (e.g., on what days and at what times)?
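
One way to start on the maintenance question above is to accumulate visitor counts until the 500-visitor threshold is reached. The sketch below is only an illustration: the daily counts are made-up numbers rather than real 307 data, `maintenance_dates` is a hypothetical helper, and the 500-hours criterion is ignored.

```python
from datetime import date, timedelta

def maintenance_dates(daily_counts, start, threshold=500):
    """Return the dates on which cumulative visitors reach the threshold.

    daily_counts: iterable of visitor counts, one per consecutive day.
    start: date of the first count.
    The running total resets after each maintenance date.
    """
    due = []
    running = 0
    for offset, count in enumerate(daily_counts):
        running += count
        if running >= threshold:
            due.append(start + timedelta(days=offset))
            running = 0  # maintenance done; start counting again
    return due

# Hypothetical daily visitor counts (NOT real 307 data)
counts = [194, 83, 97, 150, 60, 220, 240, 90]
print(maintenance_dates(counts, start=date(2019, 11, 1)))
```

With hourly instead of daily counts, the same idea could suggest a time of day as well as a date.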

The Assignment

  1. Create an interactive web page (e.g., a dashboard) that allows a user to answer the questions. Use a library such as ipywidgets (Python only), Shiny (R only), or Dash (Python via the dash library, or R via the dashR library). The web page should be either a standalone web page or a Jupyter notebook that is meant to be displayed using voila. Your group is welcome to deploy the web page on a server, but it's not necessary.
  2. Create user documentation for your interactive web page that explains the data and statistics displayed on the page.

Groups

Students will be randomly assigned to groups.

In [1]:
import numpy as np

# set random seed to course number 
np.random.seed(2453) 

students = ['JB', 'CC', 'JZC','JC', 'SG', 'PL', 'JR', 'XT', 'SV', 'LW', 'JY']

print('There are',len(students),'students in the class. ' 
      'Randomly select four groups of two and one group of three.\n')

# student initials
students = np.array(students)

# 1. select 4 groups of size 2 without replacement
groups = np.random.choice(students, size=(4,2), replace=False)

# remaining students will be in 1 group of size 3
#
# 2. flatten list of students in a pair
groups2 = [item for sublist in groups.tolist() for item in sublist]

# 3. list of students that didn't get assigned to a pair
group3 = list(set(students).difference(set(groups2)))

# print out groups
for i in range(len(groups)):
    print('Group', i+1, 'is:', groups[i,0],'and',groups[i,1])

print('Group',len(groups)+1,'is:', group3[0],',',group3[1], 'and', group3[2])
There are 11 students in the class. Randomly select four groups of two and one group of three.

Group 1 is: CC and JC
Group 2 is: JZC and SG
Group 3 is: JY and JR
Group 4 is: PL and JB
Group 5 is: XT , LW and SV
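
The same kind of assignment can also be sketched with a single shuffle. This is an alternative illustration, not the code that produced the groups above: it uses Python's standard `random` module, so the same seed gives different groups than `np.random.choice`, and `assign_groups` is a hypothetical helper.

```python
import random

def assign_groups(students, n_pairs, seed):
    """Shuffle the students, then cut the list into n_pairs pairs
    and one final group holding everyone who is left."""
    rng = random.Random(seed)  # local generator; global random state untouched
    shuffled = list(students)
    rng.shuffle(shuffled)
    pairs = [shuffled[2 * i:2 * i + 2] for i in range(n_pairs)]
    leftover = shuffled[2 * n_pairs:]
    return pairs + [leftover]

students = ['JB', 'CC', 'JZC', 'JC', 'SG', 'PL', 'JR', 'XT', 'SV', 'LW', 'JY']
for number, group in enumerate(assign_groups(students, n_pairs=4, seed=2453), start=1):
    print('Group', number, 'is:', ', '.join(group))
```

Because the last group comes from a list slice rather than a set, whose iteration order is not guaranteed, the order of names within it is the same on every run.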

Mandatory Project Workflow

  • Each student clones the assignment repository from Github to their local machine, and starts a unique branch to work on their part of the assignment.
  • Students work on their branches, committing changes and pushing their branch to the shared repository.
  • When each student finishes their part of the assignment, they start a pull request.
  • Each group works together to review the proposed changes, discuss improvements or alternatives, and resolve conflicting changes arising from concurrent development.
  • When the students agree on a resolution, they merge each pull request. The teaching team can leave feedback on commits or pull requests if they are tagged in the comments.
  • All members of the team should contribute equally to building the web page and documentation. It's not appropriate for one member to work on building the web page and the other to work on the documentation.

Git Tools Useful for working with Jupyter Notebooks

nbdime is a very useful Python library for working with .ipynb files with git and Github.

Presentation

Your presentation should demonstrate how your interactive Jupyter notebook answers the questions. Your presentation should use RISE (Python) or xaringan (R) to create slides. You may also demonstrate parts of your project using voila (Python). These tools are required so that your presentation slides are reproducible.

User Documentation

Your group will create user documentation for the web page. The documentation should be done using a Jupyter notebook or R Markdown document (see Evaluation below).


Presentation Expectations

The time allotted for each presentation is 10 minutes plus 5 minutes for questions/discussion (15 minutes for the group with three people). The time that each person speaks should be approximately equal (i.e., 5 minutes). This time limit will be enforced. If you exceed the time limit then you will be asked to stop the presentation. This means that you should rehearse your presentation timing before you present to the class.

General Presentation Guidelines

The goal of the presentation is to effectively communicate how librarians can use your web page to answer the questions (i.e., the communication is aimed at a non-technical, but educated, audience). This does not mean that you should not include technical details, but you should aim to communicate the findings to an audience without a background in statistics, math, or computer science.

You will need to remind us about the project, but only tell us what we really need to know. We are curious about the results, and how you present the results, but they are not the only purpose of this presentation. So, what should you include? Examples of questions to consider as you prepare your presentation are:

  • What problem did your group set out to solve?
  • How did your group define the problem?
  • How will your results help librarians understand patterns of use among the UofT community for electronic journals?

Date of Presentation

The Jupyter notebook or R Markdown file that you used for the presentation should be pushed to your Github repository for this assignment by **March 31, 9:45**. Your presentation will be evaluated according to this rubric.


Evaluation

User Documentation for Interactive web page

  • The user documentation should explain to users what data is being displayed on your web page. For example, if you use the data to do a calculation or create a plot then explain why the calculation was done, and how it should be interpreted.

  • The documentation should be broken into sections that correspond to the sections of your web page.

  • The user documentation should be done using a Jupyter notebook/R markdown document. Ideally your group would find a way to incorporate the documentation into the design of the web page, although this isn't necessary.

How will my user documentation be evaluated?

Your user documentation will be evaluated for clarity and conciseness.

Titles [1-5]: There should be an appropriate title for each section of the web page.

Introductions [1-5]: What is the purpose of each section?

Methods [1-5]: Statistical calculations and data visualizations should be clearly explained to users in each section of the web page without assuming a background in statistics, math, or computer science.

General Considerations [1-5]: The documentation should be presented in logical order, with well-organized sections, no grammatical, spelling, or punctuation errors, an appropriate level of technical detail, and be clear and easy to follow.

Workflow [1-5]: Groups should follow the project workflow by creating a branch for each member, pull requests, and merges using git and Github.

How will the web page be evaluated?

The web page will be graded by evaluating:

  • Data analysis and programming will be evaluated for appropriateness, readability, and reproducibility.
  • Data visualization and web page layout will be evaluated for:

    • clarity (can the data and figures be clearly seen and understood by the user?),

    • ease of use (is the web page easy to use? for example, is it easy to navigate?), and

    • communication (does the web page communicate appropriate responses to user queries?).

Date to Submit Final Web Page and User Documentation

The final Jupyter notebook or R Markdown file should be pushed to your Github repository for this assignment by **April 10, 23:59**.


Data

307 Calendar of Events

Coming soon ...

Data Access Using the Numina API

The 307 data can be accessed using the Numina API.

A few more items from Numina that may be helpful

  • A presentation given at Numina that shows how some of our data is structured and how Behavior Zones work on the backend. The relevant technical pieces start around 7:42.

Access data via the Numina Dashboard

  • Login to the dashboard and select a sensor.

  • Select mode.

  • Add a behaviour zone (optional).

  • Select time frame.

  • Export CSV.

Steps to access the data using the API

  1. Set up your Numina login and password.

  1. The login and password that you set up using the dashboard can be used to access data via the API. The Numina API is a GraphQL API. If this is your first time working with GraphQL then you can learn the fundamentals here, but we will only be using some basic queries to fetch data.
  1. Once you have set up your account, you are ready to get a login token from the Numina API. It's good practice to keep your credentials confidential, so store your login and password in another file, say login.py, and use the magic %run to read your credentials into your Jupyter notebook.

The file login.py should contain:

login = "yourname@utoronto.ca"
pwd = "yourpassword"
In [2]:
# store login data in login.py
%run login.py
  1. A sample login query is below.
In [3]:
# login query as multiline formatted string
# this assumes that login and pwd are defined 
# above

loginquery = f"""
mutation {{
  logIn(
      email:\"{login}\",
      password:\"{pwd}\") {{
    jwt {{
      token
      exp
    }}
  }}
}}
"""
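
One caveat with the f-string above is that a password containing a double quote or a backslash would produce an invalid query. A possible workaround, sketched below, is to let `json.dumps` quote and escape the values. This assumes the server accepts JSON-style escapes, which are valid in standard GraphQL string literals; `build_login_query` is a hypothetical helper name.

```python
import json

def build_login_query(email, password):
    """Build the logIn mutation with both values quoted and escaped."""
    return f"""
mutation {{
  logIn(
      email: {json.dumps(email)},
      password: {json.dumps(password)}) {{
    jwt {{
      token
      exp
    }}
  }}
}}
"""

# A password containing a double quote is escaped instead of breaking the query
print(build_login_query("yourname@utoronto.ca", 'pass"word'))
```

The result could be used in place of loginquery, e.g. `build_login_query(login, pwd)`.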
  1. A POST request can be issued to the server using the requests library.
In [4]:
import requests
url = 'https://api.numina.co/graphql'

mylogin = requests.post(url, json={'query': loginquery})
mylogin
Out[4]:
<Response [200]>

A response of 200 indicates that the request succeeded and that the Numina server returned a login token. Now, store the token in the variable token for use in subsequent queries.

In [5]:
token = mylogin.json()['data']['logIn']['jwt']['token']
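
A status code of 200 only says that the HTTP request succeeded. GraphQL servers commonly report problems such as a wrong password in an `errors` field of the JSON body while still returning 200, so it can be worth checking the payload before indexing into it. The sketch below assumes that convention applies to the Numina API; `extract_token` is a hypothetical helper.

```python
def extract_token(payload):
    """Return the JWT token from a logIn response payload (a dict).

    Raises RuntimeError if the payload reports errors or has no token.
    """
    if payload.get('errors'):
        raise RuntimeError(f"Login failed: {payload['errors']}")
    try:
        return payload['data']['logIn']['jwt']['token']
    except (KeyError, TypeError) as err:
        raise RuntimeError('Login response did not contain a token') from err

# Payload shaped like mylogin.json(); the token value is a placeholder
sample = {'data': {'logIn': {'jwt': {'token': 'abc123',
                                      'exp': '2020-02-25T21:32:57.370614'}}}}
print(extract_token(sample))
```

In the notebook this would be called as `token = extract_token(mylogin.json())`.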

Note that tokens expire after 24 hours by default.

In [6]:
expdate = mylogin.json()['data']['logIn']['jwt']['exp']
expdate
Out[6]:
'2020-02-25T21:32:57.370614'
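
Since tokens expire, a long-running dashboard may want to check the expiry time before each request and log in again when needed. The sketch below parses the exp string shown above. The timestamp carries no timezone, so treating it as UTC is an assumption, and `token_expired` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

def token_expired(exp, now=None, margin=timedelta(minutes=5)):
    """Return True if the token expires within `margin` of `now`.

    exp: expiry timestamp string as returned by the API.
    now: naive datetime in the same timezone as exp (ASSUMED to be UTC).
    """
    if now is None:
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    return now + margin >= datetime.fromisoformat(exp)

exp = '2020-02-25T21:32:57.370614'
print(token_expired(exp, now=datetime(2020, 2, 25, 12, 0)))   # well before expiry
print(token_expired(exp, now=datetime(2020, 2, 25, 21, 30)))  # inside the 5-minute margin
```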

Sample Queries

The following query requests the name, serial number, and rawId of every device (sensor). The rawId can be used to uniquely identify a device in other requests.

In [7]:
query1 = """
query {
  devices {
    count
    edges {
      node {
        rawId
        name
        serialno
      }
    }
  }
}
"""

devices = requests.post(url, json={'query': query1}, headers = {'Authorization':token})
In [8]:
devices.json()
Out[8]:
{'data': {'devices': {'count': 3,
   'edges': [{'node': {'name': 'Streetscape - Sandbox',
      'rawId': '1b41b3eb5c254ea188c5bba172a89f76',
      'serialno': 'SWLSANDBOX1'}},
    {'node': {'name': 'Outside - Sandbox',
      'rawId': '29b315c428c54c77833d10822b429ded',
      'serialno': 'SWLSANDBOX3'}},
    {'node': {'name': 'Under Raincoat - Sandbox',
      'rawId': 'b0e5945bb2b14ad5977b138cd534c42e',
      'serialno': 'SWLSANDBOX2'}}]}}}
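
The nested edges/node structure shown above can be flattened into a lookup table, which is convenient when a dashboard lets the user pick a sensor by name. This is a minimal sketch that uses the output above as its input; `devices_by_name` is a hypothetical helper.

```python
def devices_by_name(payload):
    """Map each device name to its serial number and rawId."""
    edges = payload['data']['devices']['edges']
    return {edge['node']['name']: {'serialno': edge['node']['serialno'],
                                   'rawId': edge['node']['rawId']}
            for edge in edges}

# Payload copied from the output above
payload = {'data': {'devices': {'count': 3, 'edges': [
    {'node': {'name': 'Streetscape - Sandbox',
              'rawId': '1b41b3eb5c254ea188c5bba172a89f76',
              'serialno': 'SWLSANDBOX1'}},
    {'node': {'name': 'Outside - Sandbox',
              'rawId': '29b315c428c54c77833d10822b429ded',
              'serialno': 'SWLSANDBOX3'}},
    {'node': {'name': 'Under Raincoat - Sandbox',
              'rawId': 'b0e5945bb2b14ad5977b138cd534c42e',
              'serialno': 'SWLSANDBOX2'}}]}}}

lookup = devices_by_name(payload)
print(lookup['Under Raincoat - Sandbox']['serialno'])
```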

Counts

Counts queries are used to get counts of objects that were observed in a given time interval.

The following query finds the number of pedestrians detected each day by the indoor sensor (the sensor named Streetscape - Sandbox) from 2019-12-01 to 2019-12-31.

In [9]:
query2 = """
query {
  feedCountMetrics(
    serialnos:["SWLSANDBOX1"],
    startTime:"2019-12-01T00:00:00",
    endTime:"2020-01-01T00:00:00",
    objClasses:["pedestrian"],
    timezone:"America/New_York",
    interval:"24h") {
    edges {
      node {
        serialno
        result
        objClass
        time
      }
    }
  }
}
"""

dec2019peds = requests.post(url, json={'query': query2}, headers = {'Authorization':token})

Sample output from dec2019peds.json() of the daily pedestrian counts for December 2019 is shown below:

{'data': {'feedCountMetrics': {'edges': [{'node': {'objClass': 'pedestrian',
      'result': 1.0,
      'serialno': 'SWLSANDBOX1',
      'time': '2019-12-01T00:00:00-05:00'}},
    {'node': {'objClass': 'pedestrian',
      'result': 69.0,
      'serialno': 'SWLSANDBOX1',
      'time': '2019-12-02T00:00:00-05:00'}},
      ...
      ...
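
A count response like the one above can be reduced to (date, count) pairs before plotting or summarizing. The sketch below uses only the first two entries of the sample output; `daily_counts` is a hypothetical helper, and the field names are those requested in the query.

```python
from datetime import datetime

def daily_counts(payload):
    """Return a list of (date, count) tuples from a feedCountMetrics payload."""
    rows = []
    for edge in payload['data']['feedCountMetrics']['edges']:
        node = edge['node']
        day = datetime.fromisoformat(node['time']).date()
        rows.append((day, node['result']))
    return rows

# First two entries of the sample output above
payload = {'data': {'feedCountMetrics': {'edges': [
    {'node': {'objClass': 'pedestrian', 'result': 1.0,
              'serialno': 'SWLSANDBOX1', 'time': '2019-12-01T00:00:00-05:00'}},
    {'node': {'objClass': 'pedestrian', 'result': 69.0,
              'serialno': 'SWLSANDBOX1', 'time': '2019-12-02T00:00:00-05:00'}},
]}}}

rows = daily_counts(payload)
print(rows)
print('Total pedestrians:', sum(count for _, count in rows))
```

The list of tuples can be passed to pandas, e.g. `pd.DataFrame(rows, columns=['date', 'pedestrians'])`.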
In [10]:
query3 = """
query {
  feedHeatmaps(
    serialno: "SWLSANDBOX1",
    startTime:"2019-12-01T00:00:00",
    endTime:"2019-12-31T00:00:00",
    objClasses:["pedestrian"],
    timezone:"America/New_York") {
    edges {
      node {
        time
        objClass
        heatmap
      }
    }
  }
}
"""

dec2019heat = requests.post(url, json={'query': query3}, headers = {'Authorization':token})

Sample output from dec2019heat.json() of the pedestrian heatmap data for December 2019 is shown below:

{'data': {'feedHeatmaps': {'edges': [{'node': {'heatmap': [[495, 39, 0.192],
       [496, 39, 0.192],
       [497, 39, 0.192],
       [498, 39, 0.192],
       [508, 39, 0.192],
       [487, 40, 0.192],

Visualization of this data can be done in the Numina dashboard, or you can use a library such as OpenCV-Python (an R wrapper for OpenCV is also available here).
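
Before plotting, the heatmap list can be converted into a two-dimensional grid. The sketch below assumes that each entry is an [x, y, value] triple in pixel coordinates, which is an interpretation of the sample output rather than something confirmed by the Numina documentation; `heatmap_to_grid` is a hypothetical helper.

```python
def heatmap_to_grid(points):
    """Convert [x, y, value] triples into a row-major grid (list of lists).

    Cells that do not appear in `points` are filled with 0.0. The grid is
    cropped to the bounding box of the points; the returned offset gives
    the (x, y) coordinate of the grid's top-left cell.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    width = max(xs) - x0 + 1
    height = max(ys) - y0 + 1
    grid = [[0.0] * width for _ in range(height)]
    for x, y, value in points:
        grid[y - y0][x - x0] = value
    return grid, (x0, y0)

# First entries of the sample output above
points = [[495, 39, 0.192], [496, 39, 0.192], [497, 39, 0.192],
          [498, 39, 0.192], [508, 39, 0.192], [487, 40, 0.192]]
grid, offset = heatmap_to_grid(points)
print(offset, len(grid), 'rows x', len(grid[0]), 'columns')
```

The nested list can then be handed to a plotting function, for example as the z argument of plotly's `go.Heatmap`.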

Interactivity

This section contains sample code that can add interactivity to your web page. The dataframe raincoatdat contains data from one of the sensors at 307 from November 1 through December 31, 2019.

In [11]:
import pandas as pd
import numpy as np

raincoatdat = pd.read_csv('raincoatdatnovdec2019.csv')
raincoatdat.head(n=3)
Out[11]:
                        time  pedestrians  bicyclists  cars  buses  trucks
0  2019-11-01T00:00:00-04:00          194           4    16      0       0
1  2019-11-02T00:00:00-04:00           83           0     0      0       0
2  2019-11-03T00:00:00-04:00           97           0    24      0       0

ipywidgets

ipywidgets can be used to create a pull-down menu that displays the data for a specific day during the time period.

In [12]:
import ipywidgets as widgets
from IPython.display import HTML, display

# dropdown menu of dates
dd = widgets.Dropdown(options = raincoatdat.time)

# Output widget for dataframe
out2 = widgets.Output()

# display dropdown and dataframe
display(dd, out2)

# dd_eventhand is an event handler for displaying a filtered view of the dataframe
# The callback registered must have the signature handler(change) where change is a 
# dictionary holding the information about the change.
# see https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Events.html and 
# the doc string for observe (i.e., print(widgets.Widget.observe.__doc__))

def dd_eventhand(change):
    out2.clear_output() # clear current output
    with out2:
        # display four columns of the dataframe (pedestrians through buses), filtered by the date selected in the dropdown
        display(HTML(raincoatdat[raincoatdat['time'] == change.new][list(raincoatdat)[1:5]].to_html(index=False)))

dd.observe(dd_eventhand, names = 'value')

plotly

plotly can be used to generate interactive plots and can also be combined with ipywidgets. The code below is adapted from this example.

In [13]:
import plotly.graph_objects as go

fig = go.Figure()

fig.add_trace(go.Scatter(x=raincoatdat.time, y=raincoatdat['pedestrians'], name="Pedestrians",
                         line_color='#003f5c'))

fig.add_trace(go.Scatter(x=raincoatdat.time, y=raincoatdat['bicyclists'], name="Bicyclists",
                         line_color='#7a5195'))

fig.add_trace(go.Scatter(x=raincoatdat.time, y=raincoatdat['cars'], name="Cars",
                         line_color='#ef5675'))

fig.add_trace(go.Scatter(x=raincoatdat.time, y=raincoatdat['buses'], name="Buses",
                         line_color='#ffa600'))



fig.update_layout(title_text='Number of Objects Detected - Under Raincoat Sensor',
                  xaxis_rangeslider_visible=True)
fig.show()